refactor: dedup regex/validation/provider code (file-search-on); fix JSON `in` predicate #143
Merged
Each dialect now owns the full boolean predicate for `elem in jsonArray`, emitting both the element and the array expression instead of relying on the caller to prepend `elem = `. Fixes semantically wrong SQL on MySQL, SQLite, DuckDB, and BigQuery; PostgreSQL semantics unchanged. Ported from cel2sql4j (SPANDigital/cel2sql4j@1835215). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Used the file-search-on MCP server (find_near_duplicates, dead_code, code_graph) to locate duplication and dead code, then consolidated:

- regex: extract the shared RE2 ReDoS/unsupported-feature validation from the five dialect regex.go files into dialect/internal/regexsafe.Validate, eliminating subtle per-dialect drift (e.g. only some compiled the pattern). Each dialect keeps only its character-class transform.
- validation: extract the shared validateFieldName skeleton into dialect/internal/identsafe.ValidateFieldName; dialects keep their own keyword sets and length limits.
- providers: add internal/celprovider.Base implementing the shared types.Provider boilerplate over a schema map with a TypeMapper hook; the flat providers (mysql, sqlite, duckdb, bigquery, spark) embed it. pg keeps its own implementation (nested/composite schema resolution + pool ownership).
- dead code: remove the unused top-level escapeJSONFieldName from utils.go (superseded by per-dialect copies) and its orphaned tests.

Net ~1,040 fewer lines; behavior preserved (all non-Docker tests pass, lint clean).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Two related changes, in two commits for easy review:

1. **fix: per-dialect JSON array `in` membership predicate.** Each dialect now emits the full boolean predicate for `elem in jsonArray` (element + array) instead of relying on the caller to prepend `elem =`. Fixes semantically wrong SQL on MySQL (`JSON_OVERLAPS`), SQLite/DuckDB (`EXISTS … json_each`), and BigQuery (`IN UNNEST(JSON_VALUE_ARRAY(...))`); PostgreSQL unchanged. Ported from cel2sql4j@1835215. Already noted under `[Unreleased]` in CHANGELOG.
2. **refactor: deduplicate code surfaced by file-search-on.** Used the file-search-on MCP server (`find_near_duplicates`, `dead_code`, `code_graph`) to find duplication/dead code, then consolidated it.
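To illustrate the first change, here is a minimal sketch of a dialect that owns the whole membership predicate. The `Dialect` interface, the method name `WriteJSONArrayMembership`, and the exact SQL strings are assumptions made for this example; the SQL shapes only follow the functions named above and may differ from what the project actually emits.

```go
package main

import (
	"fmt"
	"strings"
)

// Dialect is a hypothetical, trimmed-down dialect interface for this sketch.
type Dialect interface {
	// WriteJSONArrayMembership writes the complete boolean predicate for
	// `elem in jsonArray`. Both operands arrive as already-rendered SQL, so
	// the caller no longer prepends `elem = `.
	WriteJSONArrayMembership(w *strings.Builder, elemSQL, arraySQL string)
}

type mysqlDialect struct{}
type sqliteDialect struct{}
type bigQueryDialect struct{}

// MySQL: illustrative JSON_OVERLAPS form.
func (mysqlDialect) WriteJSONArrayMembership(w *strings.Builder, elemSQL, arraySQL string) {
	fmt.Fprintf(w, "JSON_OVERLAPS(%s, JSON_ARRAY(%s))", arraySQL, elemSQL)
}

// SQLite: illustrative EXISTS over json_each, comparing each array value.
func (sqliteDialect) WriteJSONArrayMembership(w *strings.Builder, elemSQL, arraySQL string) {
	fmt.Fprintf(w, "EXISTS (SELECT 1 FROM json_each(%s) WHERE value = %s)", arraySQL, elemSQL)
}

// BigQuery: illustrative IN UNNEST over the extracted array values.
func (bigQueryDialect) WriteJSONArrayMembership(w *strings.Builder, elemSQL, arraySQL string) {
	fmt.Fprintf(w, "%s IN UNNEST(JSON_VALUE_ARRAY(%s))", elemSQL, arraySQL)
}

// renderMembership is a small helper that returns the predicate as a string.
func renderMembership(d Dialect, elemSQL, arraySQL string) string {
	var b strings.Builder
	d.WriteJSONArrayMembership(&b, elemSQL, arraySQL)
	return b.String()
}

func main() {
	for _, d := range []Dialect{mysqlDialect{}, sqliteDialect{}, bigQueryDialect{}} {
		fmt.Println(renderMembership(d, "?", "t.tags"))
	}
}
```

Passing both operands to the dialect is the point of the design: forms such as `EXISTS (...)` place the element inside the expression, so a caller-side `elem = ` prefix cannot produce them.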
Refactor detail
| Duplicated or dead code | Consolidated into |
| --- | --- |
| Five `dialect/*/regex.go` files, each reimplementing the RE2 ReDoS checks with drift (only some compiled the pattern; nested-quantifier / nesting-depth loops differed) | `dialect/internal/regexsafe.Validate`; each dialect keeps only its char-class transform |
| Per-dialect copies of the `validateFieldName` skeleton | `dialect/internal/identsafe.ValidateFieldName`; dialects keep their own keyword sets / length limits |
| Repeated `types.Provider` boilerplate in the flat providers | `internal/celprovider.Base` with a `TypeMapper` hook, embedded by mysql/sqlite/duckdb/bigquery/spark (pg keeps its richer nested/composite implementation) |
| Unused top-level `escapeJSONFieldName` in `utils.go` | Removed, along with its orphaned tests |

The regex consolidation also closes a security-consistency gap: the ReDoS validation is now identical across all dialects rather than drifting per copy.

Net ~1,040 fewer lines; +312 lines of new single-source-of-truth code.
Verification
- `go build ./...`, `go vet ./...`: clean
- `golangci-lint run ./...`: 0 issues
- `gofmt`: clean
- `find_near_duplicates`: the `regex.go` and `provider.go` clusters are gone.

🤖 Generated with Claude Code